Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering

نویسندگان

  • Lei Wu
  • Rong Jin
  • Steven C. H. Hoi
  • Jianke Zhu
  • Nenghai Yu
چکیده

Learning distance functions with side information plays a key role in many machine learning and data mining applications. Conventional approaches often assume a Mahalanobis distance function. These approaches are limited in two aspects: (i) they are computationally expensive (even infeasible) for high dimensional data because the size of the metric is in the square of dimensionality; (ii) they assume a fixed metric for the entire input space and therefore are unable to handle heterogeneous data. In this paper, we propose a novel scheme that learns nonlinear Bregman distance functions from side information using a nonparametric approach that is similar to support vector machines. The proposed scheme avoids the assumption of fixed metric by implicitly deriving a local distance from the Hessian matrix of a convex function that is used to generate the Bregman distance function. We also present an efficient learning algorithm for the proposed scheme for distance function learning. The extensive experiments with semi-supervised clustering show the proposed technique (i) outperforms the state-of-the-art approaches for distance function learning, and (ii) is computationally efficient for high dimensional data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

متن کامل

Semi-supervised Kernel Metric Learning Using Relative Comparisons

We consider the problem of metric learning subject to a set of constraints on relative-distance comparisons between the data items. Such constraints are meant to reflect side-information that is not expressed directly in the feature vectors of the data items. The relative-distance constraints used in this work are particularly effective in expressing structures at finer level of detail than mus...

متن کامل

Probabilistic Semi-Supervised Clustering with Constraints

Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to same or different clusters. In recent years, a number of algorithms have been proposed for enhancing clustering quality by employing such supervision. Such methods use the constraints to either modify the objective function, or to learn th...

متن کامل

A Kernel-Learning Approach to Semi-supervised Clustering with Relative Distance Comparisons

We consider the problem of clustering a given dataset into k clusters subject to an additional set of constraints on relative distance comparisons between the data items. The additional constraints are meant to reflect side-information that is not expressed in the feature vectors, directly. Relative comparisons can express structures at finer level of detail than must-link (ML) and cannot-link ...

متن کامل

Analysis of Semi-Supervised Learning with the Yarowsky Algorithm

The Yarowsky algorithm is a rule-based semisupervised learning algorithm that has been successfully applied to some problems in computational linguistics. The algorithm was not mathematically well understood until (Abney 2004) which analyzed some specific variants of the algorithm, and also proposed some new algorithms for bootstrapping. In this paper, we extend Abney’s work and show that some ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009